Probe Selection Algorithms with Applications in the Analysis of Microbial Communities ( Extended

Authors

  • James Borneman
  • Marek Chrobak
  • Gianluca Della Vedova
  • Andres Figueroa
  • Tao Jiang
Abstract

James Borneman 1, Marek Chrobak 2, Gianluca Della Vedova 3, Andres Figueroa 2 and Tao Jiang 2

1 Department of Plant Pathology, University of California, Riverside, CA, 92521, USA, 2 Department of Computer Science, University of California, Riverside, CA, 92521, USA and 3 DISCo, Università degli Studi di Milano-Bicocca, Milano, 20126, Italy

ABSTRACT

We propose two efficient heuristics for minimizing the number of oligonucleotide probes needed for analyzing populations of ribosomal RNA gene (rDNA) clones by hybridization experiments on DNA microarrays. Such analyses have applications in the study of microbial communities. Unlike in the classical SBH (sequencing by hybridization) procedure, where multiple probes are on a DNA chip, in our applications we perform a series of experiments, each one consisting of applying a single probe to a DNA microarray containing a large sample of rDNA sequences from the studied population. The overall cost of the analysis is thus roughly proportional to the number of experiments, underscoring the need for minimizing the number of probes. Our algorithms are based on two well-known optimization techniques, i.e. simulated annealing and Lagrangian relaxation, and our preliminary tests demonstrate that both algorithms are able to find satisfactory probe sets for real rDNA data.

Contact: [email protected]

Microorganisms are of fundamental importance for agriculture, biotechnology and medicine. However, to fully manage and utilize this resource, a thorough understanding of these organisms and their communities is needed. Current estimates suggest that thousands of different microorganisms inhabit most environments, the vast majority of which have not yet been described because they do not grow on artificial media (Amann et al., 1995; Ward et al., 1992). Recent studies of microbial communities have been assisted by the development of ribosomal RNA (rRNA) gene analyses, which have eliminated the need to culture these organisms and led to the identification of thousands of previously undescribed microorganisms (Barns et al., 1994; Giovannoni et al., 1990; Pace, 1997). rRNA genes (rDNAs) are useful taxonomic indicators because they are found in all known organisms, contain both highly conserved and variable regions, and have a slow but relatively constant molecular clock or mutation rate (Woese, 1987).

Analysis of microbial communities using rRNA genes can be done using several simple approaches. The most commonly used methods include Denaturing Gradient Gel Electrophoresis (DGGE) (Muyzer et al., 1993) and Terminal Restriction Fragment Length Polymorphisms (T-RFLP) (Liu et al., 1997), both of which allow analysis of many samples in a relatively short time period. Unfortunately, they also produce limited data sets, as communities that may contain thousands of different species (Torsvik et al., 1990) are resolved into approximately 10 to 30 groups.
To obtain comprehensive depictions of community structure, investigators can use extensive sequence analysis of rDNA clone libraries. In two such studies, hundreds of bacterial rDNA clones from several soils were analyzed; no duplicates were found and none had been previously described (Borneman et al., 1996; Borneman and Triplett, 1997). Due to this remarkable diversity and to the high cost of DNA sequencing, this approach is not feasible with current technology.

The goal of our research is to develop a high-throughput approach for the examination of microbial communities. To accomplish this, we are adapting an existing strategy termed oligonucleotide fingerprinting that permits the identification of thousands of cDNA clones (Drmanac, 1999; Drmanac and Drmanac, 1994; Drmanac et al., 1996; Maier et al., 1994; Meier-Ewert et al., 1998). After the rDNA clone libraries are constructed, the clones are classified into clone types or operational taxonomic units (OTUs) by individual hybridization experiments on DNA microarrays with a series of short DNA oligonucleotides. Once the clones are classified, the nucleotide sequence of representative clones from each OTU can be obtained by DNA sequencing to provide phylogenetic descriptions of the microorganisms. One of the key features of this strategy is that after a comprehensive database that correlates hybridization patterns with nucleotide sequence data has been compiled, little additional rDNA clone sequencing ...

© Oxford University Press 2001
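The fingerprinting idea described above, together with the probe-minimization goal stated in the abstract, can be illustrated with a small sketch. It is not the authors' method: this excerpt does not give the details of their heuristics. The sketch assumes that hybridization can be idealized as exact substring matching, and the names `fingerprint`, `group_into_otus`, `cost` and `anneal_probes`, the penalty weight and the cooling schedule are all hypothetical choices made for illustration. Clones with identical hybridization patterns are placed in the same OTU, and a generic simulated annealing loop searches for a small probe subset that still separates the clones.

```python
import math
import random
from collections import defaultdict


def fingerprint(clone, probes):
    """Binary hybridization pattern: 1 if the probe occurs in the clone.

    Assumption: hybridization is idealized as exact substring matching.
    """
    return tuple(int(p in clone) for p in probes)


def group_into_otus(clones, probes):
    """Group clone names whose fingerprints are identical (one group per OTU)."""
    groups = defaultdict(list)
    for name, seq in clones.items():
        groups[fingerprint(seq, probes)].append(name)
    return sorted(sorted(g) for g in groups.values())


def cost(clones, probes, penalty=10):
    """Hypothetical objective: probe count plus a penalty per unresolved clone pair."""
    unresolved = 0
    for group in group_into_otus(clones, probes):
        unresolved += len(group) * (len(group) - 1) // 2
    return len(probes) + penalty * unresolved


def anneal_probes(clones, candidates, steps=2000, t0=5.0, cooling=0.995, seed=0):
    """Generic simulated annealing over probe subsets; returns the best subset seen."""
    rng = random.Random(seed)
    current = set(candidates)  # start from the full candidate set
    current_cost = cost(clones, sorted(current))
    best, best_cost = set(current), current_cost
    temp = t0
    for _ in range(steps):
        probe = rng.choice(candidates)
        neighbor = current ^ {probe}  # toggle one probe in or out
        neighbor_cost = cost(clones, sorted(neighbor))
        delta = neighbor_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbor, neighbor_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        temp *= cooling  # geometric cooling schedule
    return sorted(best), best_cost


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_clones = {"c1": "ACGTACGT", "c2": "ACGTTTGA",
                  "c3": "GGGTACCA", "c4": "TTGACCAA"}
    toy_candidates = ["ACGT", "TTGA", "TACC", "GGG", "CCA"]
    print(group_into_otus(toy_clones, ["ACGT"]))
    print(anneal_probes(toy_clones, toy_candidates))
```

Because each probe corresponds to one experiment, the probe count term in `cost` stands in for the experimental cost mentioned in the abstract. The penalty term is one simple way to discourage probe sets that leave clones indistinguishable, and other formulations are equally plausible.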


Similar resources

Mining Overlapping Communities in Real-world Networks Based on Extended Modularity Gain

Detecting communities plays a vital role in studying group level patterns of a social network and it can be helpful in developing several recommendation systems such as movie recommendation, book recommendation, friend recommendation and so on. Most of the community detection algorithms can detect disjoint communities only, but in the real time scenario, a node can be a member of more than one ...


Probe selection algorithms with applications in the analysis of microbial communities

We propose two efficient heuristics for minimizing the number of oligonucleotide probes needed for analyzing populations of ribosomal RNA gene (rDNA) clones by hybridization experiments on DNA microarrays. Such analyses have applications in the study of microbial communities. Unlike in the classical SBH (sequencing by hybridization) procedure, where multiple probes are on a DNA chip, in our app...


Comparison of Simulated Annealing and Electromagnetic Algorithms for Solution of Extended Portfolio Model

This paper presents two meta-heuristic algorithms to solve an extended portfolio selection model. The extended model is based on the Markowitz's Model, aiming to minimize investment risk in a specified level of return. In order to get the Markowitz model close to the real conditions, different constraints were embedded on the model which resulted in a discrete and non-convex solution space. ...


Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms

One of the important issues in speech emotion recognizing is selecting of appropriate feature sets in order to improve the detection rate and classification accuracy. In last studies researchers tried to select the appropriate features for classification by using the selecting and reducing the space of features methods, such as the Fisher and PCA. In this research, a hybrid evolutionary algorit...


Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...




Publication date: 2001